From Specialized AI to General-Purpose Large Language Models

The Paradigm Shift in Artificial Intelligence

1. From Specific to General

The field of AI has undergone a major shift in how models are trained and deployed.

Old Paradigm (Task-Specific Training): Models such as early CNNs, or BERT fine-tuned for a single task, each served one specific goal (e.g., sentiment analysis only). Translation, summarization, and every other task needed its own separately trained model.
New Paradigm (Centralized Pre-training + Prompting): One massive model (an LLM) learns general world knowledge from internet-scale datasets. It can then be directed to perform nearly any language task simply by changing the input prompt.
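To make the "change the prompt, not the model" idea concrete, here is a minimal sketch. The template wording and the names `PROMPT_TEMPLATES` and `build_prompt` are assumptions made for illustration; they do not come from any particular library, and no real model is called.

```python
# Hypothetical prompt templates: the task is selected by the wording of the
# prompt, not by swapping in a different model.
PROMPT_TEMPLATES = {
    "sentiment": (
        "Classify the sentiment of this text as Positive or Negative.\n"
        "Text: {text}\nSentiment:"
    ),
    "translation": (
        "Translate this text into French.\n"
        "Text: {text}\nTranslation:"
    ),
    "summarization": (
        "Summarize this text in one sentence.\n"
        "Text: {text}\nSummary:"
    ),
}


def build_prompt(task: str, text: str) -> str:
    """Return the prompt that steers a general-purpose model toward `task`."""
    if task not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown task: {task!r}")
    return PROMPT_TEMPLATES[task].format(text=text)


if __name__ == "__main__":
    review = "The battery lasts all day and the screen is gorgeous."
    for task in PROMPT_TEMPLATES:
        # In a real system, every one of these prompts would be sent to the
        # same LLM; only the instruction text differs between tasks.
        print(build_prompt(task, review))
        print("-" * 40)
```

Under the old paradigm, each of the three dictionary entries would instead correspond to a separately trained model.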

2. Architectural Evolution

Encoder-only (BERT era): Focused on understanding and classification. These models read text bidirectionally, so every token sees the context on both sides, but they are not designed to generate new text.
Decoder-only (GPT/Llama era): The modern standard for generative AI. These models use auto-regressive modeling to predict the next token from the tokens before it, making them ideal for open-ended generation and conversation.
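The difference between the two architectures can be sketched as a difference in which positions each token is allowed to see, plus the generation loop that a decoder runs. This is a toy illustration in plain Python, not a real transformer: the bigram counter below is a stand-in assumption for a trained decoder, and all function names are hypothetical.

```python
def bidirectional_mask(n: int) -> list:
    """Encoder-style mask: every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def causal_mask(n: int) -> list:
    """Decoder-style mask: position i may attend only to positions 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def train_bigram(tokens: list) -> dict:
    """Count next-token frequencies from raw text (no human labels needed).

    A real decoder conditions on the whole prefix; this toy model looks only
    at the previous token.
    """
    counts = {}
    for current, nxt in zip(tokens, tokens[1:]):
        followers = counts.setdefault(current, {})
        followers[nxt] = followers.get(nxt, 0) + 1
    return counts


def generate(counts: dict, start: str, max_new_tokens: int) -> list:
    """Greedy auto-regressive loop: each new token is appended and fed back in."""
    output = [start]
    for _ in range(max_new_tokens):
        followers = counts.get(output[-1])
        if not followers:
            break  # no known continuation for the last token
        output.append(max(followers, key=followers.get))
    return output


if __name__ == "__main__":
    for row in causal_mask(4):
        print(["x" if allowed else "." for allowed in row])
    corpus = "the cat sat on the mat and the cat slept".split()
    print(generate(train_bigram(corpus), "the", 4))
```

The causal mask is what makes next-token prediction a fair task: a position can never peek at the token it is supposed to predict. The all-True mask suits classification, where the whole input is available at once.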

3. Key Drivers of the Shift

Self-Supervised Learning: Training on vast amounts of unlabeled internet data, where the text itself supplies the training signal (for example, the next token), removing the bottleneck of human annotation.
Scaling Laws: The empirical observation that model performance improves predictably with model size (parameters), data volume (tokens), and compute.
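One commonly cited way to write a scaling law is a parametric loss of the form L(N, D) = E + A / N^alpha + B / D^beta, where N is the parameter count and D is the number of training tokens. The sketch below assumes that form. The constants are placeholders chosen only for illustration and are not fitted values from any experiment. The compute estimate C ≈ 6 * N * D is a widely used rule of thumb for dense transformers, not an exact figure.

```python
def predicted_loss(n_params: float, n_tokens: float,
                   e: float = 2.0, a: float = 100.0, b: float = 100.0,
                   alpha: float = 0.3, beta: float = 0.3) -> float:
    """Parametric loss L(N, D) = E + A / N**alpha + B / D**beta.

    All default constants are illustrative placeholders, NOT fitted values.
    `e` plays the role of an irreducible loss floor that scaling cannot remove.
    """
    return e + a / n_params ** alpha + b / n_tokens ** beta


def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute for a dense transformer: C ~ 6 * N * D."""
    return 6.0 * n_params * n_tokens


if __name__ == "__main__":
    n_tokens = 1e11  # hold the dataset size fixed and grow the model
    for n_params in (1e8, 1e9, 1e10):
        loss = predicted_loss(n_params, n_tokens)
        flops = training_flops(n_params, n_tokens)
        print(f"N={n_params:.0e}  D={n_tokens:.0e}  "
              f"C~{flops:.1e} FLOPs  predicted loss={loss:.3f}")
```

Two properties of the formula match the claim in the text: the loss falls smoothly as either N or D grows, and the gains shrink as the loss approaches the floor `e`.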

Key Insight

AI has moved from specialized tools to general-purpose agents that exhibit emergent abilities such as reasoning and in-context learning.

Question 1

What is the primary difference between the "Old Paradigm" and the "New Paradigm" of AI?

Moving from cloud computing to local processing.

Moving from task-specific training to centralized pre-training with prompting.

Moving from Python to C++ for model development.

Moving from Decoder-only to Encoder-only architectures.

Question 2

According to Scaling Laws, which three factors are fundamentally linked to model performance?

Internet speed, RAM size, and CPU cores.

Human annotators, code efficiency, and server location.

Model size (parameters), data volume (tokens), and total computation.

Prompt length, temperature setting, and top-k value.

Challenge: Evaluating Architectural Fitness

Apply your knowledge of model architectures to real-world scenarios.

You are an AI architect tasked with selecting the right foundational approach for two different projects. You must choose between an Encoder-only (like BERT) or a Decoder-only (like GPT) architecture.

Task 1

You are building a system that only needs to classify incoming emails as "Spam" or "Not Spam" based on the entire context of the message. Which architecture is more efficient for this narrow task?

Solution: Encoder-only (e.g., BERT)

Because the task is classification and requires deep, bidirectional understanding of the text without needing to generate new text, an Encoder-only model is highly efficient and appropriate.

Task 2

You are building a creative writing assistant that helps authors brainstorm ideas and write the next paragraph of their story. Which architecture is the modern standard for this?

Solution: Decoder-only (e.g., GPT/Llama)

This task requires open-ended text generation. Decoder-only models are designed specifically for auto-regressive next-token prediction, making them the standard for generative AI applications.